Expand blob prefetch noop cache to N entries #2004
Open
tyrielv wants to merge 2 commits into
Replace the single-entry LastBlobPrefetch.dat cache with a multi-entry BlobPrefetchCache.dat that stores up to N entries (default 100), keyed by SHA256 hash of (files, folders, hydrate) and storing the commit ID. This avoids redundant diff+download work when users cycle through a small set of prefetch patterns (e.g. 3 different file/folder combos), which previously caused 2/3 of calls to miss the single-entry cache.

Changes:
- BlobPrefetcher: replace flat 4-key dictionary with hash-keyed cache
- BlobPrefetcher.ComputeCacheKey: canonical, order-independent hashing
- BlobPrefetcher.SavePrefetchArgs: single-entry eviction when at capacity
- PrefetchVerb: read gvfs.prefetchCacheSize config (0=disabled, max 1000)
- PrefetchVerb: use BlobPrefetchCache.dat instead of LastBlobPrefetch.dat
- 12 unit tests covering key determinism, order independence, cache hit/miss, multi-entry support, and null/empty edge cases

Assisted-by: Claude Opus 4.6
Signed-off-by: Tyrie Vella <tyrielv@gmail.com>
The multi-entry prefetch cache persists across ordered tests, causing cache hits where the tests expect fresh prefetch work. Delete BlobPrefetchCache.dat in [SetUp] so each test starts with a clean cache.

Assisted-by: Claude Opus 4.6
Signed-off-by: Tyrie Vella <tyrielv@gmail.com>
Summary
Replace the single-entry LastBlobPrefetch.dat cache with a multi-entry BlobPrefetchCache.dat that stores up to N entries (default 100), keyed by SHA256 hash of (files, folders, hydrate) and storing the commit ID.
Problem
The most common blob prefetch use case cycles through ~3 different prefetch calls with different file/folder patterns. Since only 1 entry was cached, 2 of the 3 always missed and re-ran the full pipeline (diff + existence checks + downloads) even when nothing changed.
Changes
- BlobPrefetcher.cs: Replace flat 4-key dictionary with hash-keyed multi-entry cache. Add ComputeCacheKey (canonical, order-independent SHA256 hashing) and single-entry eviction when at capacity.
- PrefetchVerb.cs: Read gvfs.prefetch-cache-size config (default 100, 0 disables, max 1000). Use BlobPrefetchCache.dat instead of LastBlobPrefetch.dat.
- BlobPrefetcherTests.cs: 12 unit tests covering key determinism, order independence, cache hit/miss, multi-entry support, and null/empty edge cases.
Configuration
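As an illustration of the limits stated for the cache size setting (default 100, 0 disables, max 1000), the following is a minimal Python sketch, not the PR's C# code. The function name `resolve_cache_size` is hypothetical. Clamping values above 1000 down to the maximum, treating negative values as disabled, and falling back to the default on unparsable input are all assumptions; the PR may handle those cases differently.

```python
DEFAULT_CACHE_SIZE = 100   # default stated in the PR description
MAX_CACHE_SIZE = 1000      # maximum stated in the PR description


def resolve_cache_size(raw_value):
    """Turn a raw config string (or None when unset) into an entry limit.

    Hypothetical helper: 0 means the cache is disabled.
    """
    if raw_value is None or raw_value.strip() == "":
        return DEFAULT_CACHE_SIZE
    try:
        size = int(raw_value.strip())
    except ValueError:
        # Assumption: an unparsable value falls back to the default.
        return DEFAULT_CACHE_SIZE
    if size <= 0:
        # Assumption: negative values are treated like 0 (disabled).
        return 0
    # Assumption: values above the maximum are clamped, not rejected.
    return min(size, MAX_CACHE_SIZE)
```

Under these assumptions, an unset value resolves to 100, "0" disables the cache, and "5000" is reduced to 1000.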
Backward Compatibility
LastBlobPrefetch.dat is simply ignored: the first prefetch after upgrade is a cache miss (acceptable).
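The cache behavior described under Changes (an order-independent SHA256 key over files, folders, and hydrate, mapped to a commit ID, with one entry evicted when the cache is full) can be sketched roughly as follows. This is a language-neutral illustration in Python, not the PR's C# implementation. The names `compute_cache_key` and `PrefetchCache`, the byte layout fed to the hash, the choice to treat None like an empty list, and the oldest-first eviction policy are all assumptions; the PR only states that a single entry is evicted at capacity. Persistence to BlobPrefetchCache.dat is omitted.

```python
import hashlib
from collections import OrderedDict


def compute_cache_key(files, folders, hydrate):
    """Return a SHA-256 hex digest that ignores the order of files/folders.

    Assumptions: None is treated like an empty list, and duplicates are ignored.
    """
    h = hashlib.sha256()
    for label, items in (("files", files), ("folders", folders)):
        canonical = sorted(set(items or []))  # sorting makes the key order-independent
        h.update(label.encode("utf-8") + b"\0")
        h.update(len(canonical).to_bytes(4, "big"))
        for item in canonical:
            data = item.encode("utf-8")
            # Length-prefix each entry so ["ab", "c"] and ["a", "bc"] hash differently.
            h.update(len(data).to_bytes(4, "big") + data)
    h.update(b"hydrate=1" if hydrate else b"hydrate=0")
    return h.hexdigest()


class PrefetchCache:
    """In-memory stand-in for the multi-entry cache: key -> commit ID."""

    def __init__(self, capacity=100):
        self.capacity = capacity
        self._entries = OrderedDict()  # oldest entry first

    def is_noop(self, files, folders, hydrate, commit_id):
        """True when the same arguments were already prefetched at this commit."""
        if self.capacity <= 0:  # 0 disables the cache
            return False
        key = compute_cache_key(files, folders, hydrate)
        return self._entries.get(key) == commit_id

    def save(self, files, folders, hydrate, commit_id):
        """Record a completed prefetch, evicting one entry if at capacity."""
        if self.capacity <= 0:
            return
        key = compute_cache_key(files, folders, hydrate)
        if key in self._entries:
            del self._entries[key]  # re-added below as the newest entry
        elif len(self._entries) >= self.capacity:
            # Assumption: the oldest entry is the one evicted.
            self._entries.popitem(last=False)
        self._entries[key] = commit_id
```

With a capacity of at least 3, a caller that cycles through three distinct file/folder patterns at an unchanged commit would find all three entries still present on the second pass, which is the scenario the Problem section describes.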